
[tmva][sofie] Restructure emitted code to be differentiable with Clad#18332

Merged
guitargeek merged 2 commits into root-project:master from guitargeek:sofie_ad
Mar 30, 2026

Conversation

@guitargeek
Contributor

@guitargeek guitargeek commented Apr 9, 2025

The idea of this commit is to refactor the doInfer() function that implements the inference from a member function of the Session struct to a free function that takes the Session by const-reference.

This free function should only use the session struct and bare C-style arrays, so that Clad will have no problem differentiating it.

A unit test for the differentiation of a simple MLP is implemented, embedded in the existing SOFIE tests.

For illustration of the changes, here is how the layout of the code emitted for the Linear_16 unit tests looks before and after this PR:

Before:

struct Session {

   Session(std::string filename = "Linear_16.dat");
   void doInfer(float const *tensor_input1, std::vector<float> &output_tensor_39)
   {
      // operator code here
   }

   std::vector<float> infer(float const *tensor_input1)
   {
      std::vector<float> output_tensor_39;
      doInfer(tensor_input1, output_tensor_39);
      return {output_tensor_39};
   }
};

After:

struct Session;

inline void doInfer(Session const &session, float const *tensor_input1, float *tensor_39);

struct Session {

   Session(std::string filename = "Linear_16.dat");

   std::vector<float> infer(float const *tensor_input1)
   {
      std::vector<float> output_tensor_39(160);
      doInfer(*this, tensor_input1, output_tensor_39.data());
      return {output_tensor_39};
   }
};

inline void doInfer(Session const &session, float const *tensor_input1, float *tensor_39)
{
   // operator code is here
}

One side-benefit of this refactor is that users now get a generated inference function that doesn't force the output into a heap-allocated std::vector, but instead takes a C-style output array. The existing Session::infer() signature is unchanged for full backwards compatibility.

@guitargeek guitargeek self-assigned this Apr 9, 2025
@github-actions

github-actions bot commented Apr 9, 2025

Test Results

22 files, 22 suites, 3d 3h 49m 52s ⏱️
3 834 tests: 3 833 ✅, 1 💤, 0 ❌
76 577 runs: 76 559 ✅, 18 💤, 0 ❌

Results for commit aa52030.

♻️ This comment has been updated with latest results.

@guitargeek guitargeek force-pushed the sofie_ad branch 8 times, most recently from 6b90cb6 to 87597cd Compare April 15, 2025 06:33
@guitargeek
Contributor Author

guitargeek commented Apr 22, 2025

Proof of concept test for this PR

Take this ONNX file (remove the .txt suffix after downloading):

VRlL_real_500k_evts_model.onnx.txt

Here are the scripts to convert the model to C++ and then to differentiate it with Clad:

// onnx_to_cpp.C

void onnx_to_cpp()
{
   using namespace TMVA::Experimental;
   SOFIE::RModelParser_ONNX parser;
   SOFIE::RModel model = parser.Parse("./VRlL_real_500k_evts_model.onnx");
   model.SetOptimizationLevel(SOFIE::OptimizationLevel::kBasic);
   model.Generate();
   model.PrintRequiredInputTensors();

   model.OutputGenerated("./VRlL_real_500k_evts_model.hxx");
}
// sofie_ad.C

#include "VRlL_real_500k_evts_model.hxx"

#include <Math/CladDerivator.h>

using Sess = TMVA_SOFIE_VRlL_real_500k_evts_model::Session;

// Wrapper functions for Clad
float my_func(Sess const &session, float const *tensor_x, float *tensor_theory_params)
{
   float out = 0.;
   TMVA_SOFIE_VRlL_real_500k_evts_model::doInfer(session, tensor_x, tensor_theory_params, &out);
   return out;
}

float my_func_wrapper(Sess const &session, float const *tensor_x, float *tensor_theory_params)
{
   return my_func(session, tensor_x, tensor_theory_params);
}

void sofie_ad()
{
   // Let's go Clad!
   clad::gradient(my_func_wrapper, "tensor_theory_params");

   // Get a function pointer to the pullback. If you are unsure what the
   // signature is, try to cast the pullback to some function pointer, like
   // static_cast<void (*)(float)>(my_func_pullback) in the interpreter, and
   // the compiler will tell you what the real signature is.
   using Grad_t = void (*)(const Sess &, const float *, float *, float, Sess *, float *);

   // Get the function from the interpreter (remove the semicolon to get the code printed)
   auto grad = reinterpret_cast<Grad_t>(gInterpreter->ProcessLine("my_func_pullback;"));

   std::vector<float> input1{5.0, 2.0, 1.0, -1.0, 1.0};
   std::vector<float> input2{0.0};

   // A trick: pre-allocate Session structs for both the forward and the
   // backward pass, so that no memory allocation of intermediate tensors has
   // to happen in the gradient.
   TMVA_SOFIE_VRlL_real_500k_evts_model::Session s("VRlL_real_500k_evts_model.dat");
   TMVA_SOFIE_VRlL_real_500k_evts_model::Session d_s("VRlL_real_500k_evts_model.dat");

   // Calculate numerical gradient
   auto numDiff = [&](int i) {
      const float eps = 1e-4;
      std::vector<float> p{input2};
      p[i] = input2[i] - eps;
      float funcValDown = my_func(s, input1.data(), p.data());
      p[i] = input2[i] + eps;
      float funcValUp = my_func(s, input1.data(), p.data());
      return (funcValUp - funcValDown) / (2 * eps);
   };

   for (std::size_t i = 0; i < input2.size(); ++i) {
      std::cout << i << ":" << std::endl;
      std::cout << "  numr : " << numDiff(i) << std::endl;
   }

   // Calculate gradient with Clad
   float grad_output[]{0., 0., 0., 0., 0.};
   grad(s, input1.data(), input2.data(), 1.0, &d_s, grad_output);

   std::cout << "  clad : " << grad_output[0] << std::endl;
}

Usage with expected output (replace libblas.so location with relevant path for your system):

root [1] .x onnx_to_cpp.C
Model requires following inputs:
Fully Specified Tensor name: theory_params	type: float	shape: [1]
Fully Specified Tensor name: x	type: float	shape: [5]

root [2] .x sofie_ad.C
0:
  numr : -0.531077
  clad : -0.532437
root [3] .q

@guitargeek guitargeek force-pushed the sofie_ad branch 3 times, most recently from 89b638c to a3d545f Compare May 7, 2025 14:42
@guitargeek guitargeek changed the title [TMVA][SOFIE] Restructure emitted code to be differentiable with Clad [tmva][sofie] Restructure emitted code to be differentiable with Clad May 7, 2025
@guitargeek guitargeek force-pushed the sofie_ad branch 2 times, most recently from 3f40542 to 78fcc20 Compare May 8, 2025 09:12
@guitargeek guitargeek force-pushed the sofie_ad branch 2 times, most recently from 4c9920f to 97903fa Compare July 15, 2025 14:52
@guitargeek guitargeek closed this Aug 5, 2025
@guitargeek guitargeek deleted the sofie_ad branch August 5, 2025 17:14
@vgvassilev
Member

Why did we decide to not pursue this?

@guitargeek
Contributor Author

guitargeek commented Aug 11, 2025

@vgvassilev, sorry that was totally an accident. Maybe I confused it with another PR, or I wanted to close and re-open the PR to run the tests, but apparently I missed the "reopen" button.

@guitargeek guitargeek restored the sofie_ad branch August 11, 2025 08:20
@guitargeek guitargeek reopened this Aug 11, 2025
@guitargeek guitargeek force-pushed the sofie_ad branch 7 times, most recently from 4084f5b to d1dfa3f Compare March 12, 2026 08:59
@guitargeek guitargeek force-pushed the sofie_ad branch 2 times, most recently from 0434fc8 to 02aa923 Compare March 13, 2026 16:15
@guitargeek guitargeek force-pushed the sofie_ad branch 6 times, most recently from d99553f to aa52030 Compare March 28, 2026 21:10
@guitargeek guitargeek marked this pull request as ready for review March 28, 2026 21:13
@vgvassilev
Member

Is there any way we can easily compare the performance and memory footprint against say PyTorch?

@guitargeek
Contributor Author

Yes, I'm working on that. So far, the generated gradient is not competitive because it's not optimized. I'll have to throw a few more `pragma clad checkpoint` loop annotations into the generated code, for example.

But these optimizations are better done in a separate PR. I also need to follow up with a RooONNXFunction class for easy use in RooFit. Performance optimizations only come after that.

Member

@vgvassilev vgvassilev left a comment


Lgtm!

Member

@lmoneta lmoneta left a comment


LGTM!
Thank you Jonas for this very useful addition to SOFIE.
Apart from a minor thing, I have just one comment on the test: whether it is better to keep the diff test separate from the others.

…n Session ctor"

This reverts commit 1f747b0.

The reason for the revert is that it's actually useful to have the
maximum dynamic tensor size as a datamember of the Session, because then
we can refactor the generated code such that it can be differentiated
with Clad.
The idea of this commit is to refactor the `doInfer()` function that
implements the inference from a member function of the `Session` struct
to a free function that takes the `Session` by `const`-reference.

This free function should only use the session struct and bare C-style
arrays, so that Clad will have no problem differentiating it.

A unit test for the differentiation of a simple MLP is implemented,
embedded in the existing SOFIE tests.
@guitargeek
Contributor Author

CI failure is unrelated. The SOFIE Keras parser tests fail in all PRs on alma8 today, almost certainly because of the update from NumPy 2.4.3 to 2.4.4.

@guitargeek guitargeek merged commit 3387080 into root-project:master Mar 30, 2026
27 of 29 checks passed
@guitargeek guitargeek deleted the sofie_ad branch March 30, 2026 19:15